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We introduce a new method to price American-style options on 
underlying investments governed by stochastic volatility (SV) mod- 
els. The method does not require the volatility process to be ob- 
. served. Instead, it exploits the fact that the optimal decision func- 

tions in the corresponding dynamic programming problem can be 
expressed as functions of conditional distributions of volatility, given 
Qh ■ observed data. By constructing statistics summarizing information 

Q.,} 1 about these conditional distributions, one can obtain high quality 

approximate solutions. Although the required conditional distribu- 
tions are in general intractable, they can be arbitrarily precisely ap- 
proximated using sequential Monte Carlo schemes. The drawback, 
Q" 1 as with many Monte Carlo schemes, is potentially heavy computa- 

tional demand. We present two variants of the algorithm, one closely 
related to the well-known least-squares Monte Carlo algorithm of 
Longstaff and Schwartz [The Review of Financial Studies 14 (2001) 
1 113-147], and the other solving the same problem using a "brute 

, force" gridding approach. We estimate an illustrative SV model us- 

£f) ■ ing Markov chain Monte Carlo (MCMC) methods for three equities. 

We also demonstrate the use of our algorithm by estimating the pos- 
terior distribution of the market price of volatility risk for each of the 
three equities. 



(N 
> 



O 

o 



x 



1. Introduction. American-style option contracts are traded extensively 
over several exchanges. These early-exercise financial derivatives are typ- 
ically written on equity stocks, foreign currency and some indices, and 
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include, among other examples, options on individual equities traded on 
The American Stock Exchange (AMEX), options on currency traded on the 
Philadelphia Stock Exchange (PHLX) and the OEX index options on the 
S&P 100 Index traded on the Chicago Board Options Exchange (CBOE). As 
with any other kind of option, methods for pricing are based on assumptions 
about the probabilistic model governing the evolution of the underlying as- 
set. Arguably, stochastic volatility models are the most realistic models to 
date for underlying equities, but existing methods for pricing American-style 
options have mostly been developed using less realistic models, or else as- 
suming that volatility is observable. In this paper we develop a new method 
for pricing American-style options when the underlying process is governed 
by a stochastic volatility model, and the volatility is not directly observable. 
The method yields near-optimal solutions under the model assumptions, 
and can also formally take into account the market price of volatility risk 
(or volatility risk premium). 

It follows from the fundamental theorem of arbitrage that an option price 
can be determined by computing the discounted expectation of the payoff of 
the option under a risk-neutral measure, assuming that the exercise decision 
is made so as to maximize the payoff. While this is simple to compute for 
European-style options, as illustrated in the celebrated papers of Black and 
Scholes (1973) and Merton (1973), the pricing problem is enormously more 
difficult for American-style options, due to the possibility of early exercise. 
For American-style options, the price is in fact the supremum over a large 
range of possible stopping times of the discounted expected payoff under a 
risk-neutral measure. A range of methods has been developed to find this 
price, or equivalently, to solve a corresponding stochastic dynamic program- 
ming problem. Glasserman (2004) provides a thorough review of American- 
style option pricing with a strong emphasis on Monte Carlo simulation-based 
procedures. Due to the difficulty of the problem, certain assumptions are usu- 
ally made. For instance, a number of effective algorithms [including those 
developed in Brennan and Schwartz (1977), Broadie and Glasserman (1997), 
Carr, Jarrow, and Myneni (1992), Geske and Johnson (1984), Longstaff and 
Schwartz (2001), Rogers (2002), and Sullivan (2000)] are based on the as- 
sumption that the underlying asset price is governed by a univariate diffusion 
process with a constant and/or directly observable volatility process. 

As recognized by Black and Scholes (1973) and others, the assumption of 
constant volatility is typically inconsistent with observed data. The volatil- 
ity "smile" (or "smirk") is one example where empirical data show evidence 
against constant volatility models. The smile (smirk) effect arises when op- 
tion contracts with different strike prices, all other contract features being 
equivalent, result in different implied volatilities (i.e., the volatility required 
to calibrate to market observed option prices). A variety of more realistic 
models has subsequently been developed for asset prices, with stochastic 
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volatility models arguably representing the best models to date. This has 
led researchers to develop pricing methods for European-style options when 
the underlying asset price is governed by stochastic volatility models [e.g., 
Fouque, Papanicolaou, and Sircar (2000), Heston (1993), Hull and White 
(1987) and Stein and Stein (1991)]. However, work on pricing of American- 
style options under stochastic volatility models is far less developed. A num- 
ber of authors [including Clarke and Parrott (1999), Finucane and Tomas 
(1997), Fouque, Papanicolaou, and Sircar (2000), Guan and Guo (2000), 
Tzavalis and Wang (2003) and Zhang and Lim (2006)] have made valu- 
able inroads in addressing this problem, but most assume that volatility is 
observable. Fouque, Papanicolaou, and Sircar (2000) provide an approxima- 
tion scheme based on the assumption of fast mean-reversion in the volatility 
process, and they use a clever asymptotic expansion method to correct the 
constant volatility option price to account for stochastic volatility. The cor- 
rection involves parameters estimated from the implied volatility surface and 
they derive a pricing equation that does not depend directly on the volatility 
process. Tzavalis and Wang (2003) use an analytic-based approach whereby 
they compute the optimal stopping boundary using Chebyshev polynomials 
to value American-style options in a stochastic volatility framework. They 
derive an integral representation of the option price that depends on both 
the share price and level of volatility. 

Additionally, the approach in Clarke and Parrott (1999) uses a multi- 
grid technique where both the asset price and volatility are state variables 
in a two-dimensional parabolic partial differential equation (PDE). Pricing 
options in a stochastic volatility framework using PDE methods are feasible 
once we assume that volatility itself is an observed state variable. Typically, 
there is a grid in one dimension for the share price and another dimension for 
the volatility. The final option price is a function of both the share price and 
volatility. Guan and Guo (2000) derive a lattice-based solution to pricing 
American options with stochastic volatility. Following the work in Finucane 
and Tomas (1997), they construct a lattice that depends directly on the 
asset price and volatility and they illustrate an empirical study where they 
back out the parameters from the stochastic volatility model using data on 
American-style S&P 500 futures options. The valuation algorithm that they 
develop, however, involves an explicit dependence on both the share price 
and volatility state variables. Additionally, Zhang and Lim (2006) propose 
a valuation approach that is based on a decomposition of American option 
prices, however, volatility is a variable in the pricing result. 

The approach we develop, in contrast with the aforementioned methods, 
combines the optimal decision-making problem with the volatility estima- 
tion problem. We assume that the asset price follows a stochastic volatility 
model, that observations are made at discrete points in time t = 0, 1, 2, . . . , 
and that exercise decisions are made immediately after each observation. 
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It could be argued that volatility should be considered observable since one 
could simply compute "Black and Scholes type" implied volatilities from ob- 
served option prices. However, implied volatilities are based on the assump- 
tion of a simple geometric Brownian motion model (or some other simplified 
diffusion process) and, thus, their use would defeat the purpose of devel- 
oping pricing methods with more realistic models. The implied volatility 
calculation is not as straightforward in a stochastic volatility (multivariate) 
modeling framework. Renault and Touzi (1996) approximate a Black and 
Scholes type implied volatility quantity in a Hull and White (1987) setting. 
The analysis in Renault and Touzi (1996) computes filtered volatilities using 
an iterative approach and illustrates applications to hedging problems. On 
the other hand, we aim to compute the posterior distribution of volatility 
conditional on observed data. Our pricing scheme is based on two key obser- 
vations. First, we use a sequential Monte Carlo (also referred to as "particle 
filtering" ) scheme to perform inference on the unobserved volatility process 
at any given point in time. Second, conditional distributions of the unob- 
served volatility at a given point in time, given current and past observations 
of the price process, which are necessary for finding an exact solution to the 
dynamic programming problem, can be well approximated by a summary 
vector or by a low-dimensional parametric family of distributions. 

Inference on the latent volatility process for the purpose of option pricing 
is an application of the more general methodology that addresses partially 
observed time series models in a dynamic programming/optimal control set- 
ting [see, e.g., Bertsekas (2005, 2007) and the references therein]. Among 
earlier work, Sorenson and Stubberud (1968) provide a method based on 
Edgeworth expansions to estimate the posterior density of the latent process 
in a nonlinear, non-Gaussian state-space modeling framework. Our objective 
in this paper is to illustrate an algorithm that allows an agent (a holder of 
an American option) to optimally decide the exercise time assuming that 
both the share price and volatility are stochastic variables. In our partially 
observed setting, only the share price is observable; volatility is a latent 
process. The main challenge for the case of American-style options is deter- 
mining the continuation value so that an optimal exercise/hold decision can 
be made at each time point. 

Several researchers have applied Monte Carlo methods to solve the Amer- 
ican option pricing problem. Most notable among these include Longstaff 
and Schwartz (2001), Tsitsiklis and Van Roy (2001), Broadie and Glasser- 
man (1997) and Carriere (1996). An excellent summary of the work done on 
Monte Carlo methods and American option pricing is presented in Chapter 
8 of Glasserman (2004). The least -squares Monte Carlo (LSM) algorithm of 
Longstaff and Schwartz (2001) has achieved much popularity because of its 
intuitive regression-based approach to American option pricing. The LSM 
algorithm is very efficient to price American-style options since as long as 
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one could simulate observations from the pricing model, then a regression- 
based procedure could be employed along with the dynamic programming 
algorithm to price the options. Pricing American-style options in a stochastic 
volatility framework is straightforward using the methodology of Longstaff 
and Schwartz (2001) as long as draws from the share price and volatility 
processes can be obtained. That is, at each time point, n, the decision of 
whether or not to exercise an American option will be a function of the 
time n share price, S n (and possibly some part of its recent history S" n _i, 
S n -2,---), and the time n volatility, a n . Our pricing framework, however, 
needs to accommodate a latent volatility process. 

We propose to combine the Longstaff and Schwartz (2001) idea with a 
sequential Monte Carlo step whereby at each time point n, we estimate 
the conditional (posterior) distribution of the latent volatility given the ob- 
served share price data up to that time. We propose a Monte Carlo based 
approach that uses a summary vector to capture the key features of this 
conditional distribution. As an alternative, we also explore a grid-based ap- 
proach, studied in Rambharat (2005), whereby we propose a parametric 
approximation to the conditional distribution that is characterized by the 
summary vector. Therefore, our exercise decision at a given time point is 
based on the observed share price and the computed summary vector com- 
ponents at that time. Although our pricing approach is computationally 
intensive, as it combines nonlinear filtering with the early-exercise feature 
of American-style option valuation, it provides a way to solve the associated 
optimal stopping problem in the presence of a latent stochastic process. We 
compare our approach to a basic LSM method whereby at a given time 
point, we use a few past observations to make the exercise decision in order 
to price American-style options in a stochastic volatility framework. We also 
compare our method to the LSM approach using the current share price and 
an estimate of the current realized volatility as a proxy for the true volatil- 
ity. In order to assess precision, we compare LSM with past share prices, 
LSM with realized volatility, and our proposed valuation technique to the 
LSM-based American option price assuming that share price and volatility 
can be observed (the full information state). The method closest to the full 
information state benchmark would be deemed the most accurate. 

The present analysis addresses the problem of pricing an American-style 
option, once a good model has already been found. It is worth noting, how- 
ever, that since neither the sequential Monte Carlo scheme nor the gridding 
approach used in our pricing technique are tied to a particular model, the 
method is generalizable in a straightforward manner to handle a fairly wide 
range of stochastic volatility models. Thus, it could be used to perform op- 
tion pricing under a range of variants of stochastic volatility models, such as 
those discussed in Chernov et al. (2003). Although the focus of our paper is 
not model estimation/selection methodology, we do implement a thorough 
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statistical exercise using share price history to estimate model parameters 
from a stochastic volatility model. [See Chernov and Ghysels (2000), Eraker 
(2004), Gallant, Hsieh, and Tauchen (1997), Ghysels, Harvey, and Renault 
(1996), Jacquier, Poison, and Rossi (1994), Kim, Shephard, and Chib (1998) 
and Pan (2002) for examples of work that mainly address model estima- 
tion/selection issues]. Our approach will be to employ a sequential Monte 
Carlo based procedure to estimate the log-likelihood of our model [see, e.g., 
Kitagawa and Sato (2001) and the references therein] and then use a Markov 
chain Monte Carlo (MCMC) sampler to obtain posterior distributions of 
all model parameters. Conditional on a posterior summary measure of the 
model parameters (such as the mean or median), we then estimate the (ap- 
proximate) posterior distribution of the market price of volatility risk. Our 
analysis is illustrated in the context of three equities (Dell Inc., The Walt 
Disney Company and Xerox Corporation). 

The paper is organized as follows. In Section 2 we formally state the class 
of stochastic volatility models with which we work. In Section 3 we review 
the dynamic programming approach for pricing American-style options and 
demonstrate how it can be transformed into an equivalent form, and in- 
troduce (i) a sequential Monte Carlo scheme that yields certain conditional 
distributions, and (ii) a gridding algorithm that makes use of the sequen- 
tial Monte Carlo scheme to compute option prices. Section 4 describes the 
pricing algorithms and presents some illustrative numerical experiments. In 
Section 5 we describe (i) the MCMC estimation procedure for the stochastic 
volatility model parameters, and (ii) the inferential analysis of the market 
price of volatility risk. Section 6 contains posterior results of our empirical 
analysis with observed market data. Section 7 provides concluding remarks. 
Finally, we present additional technical details in the Appendix. We also 
state references to our computing code and data sets in supplementary. 

2. The stochastic volatility model. Let (Q, J 7 , P) be a probability space, 
and let {S(t), t > 0} be a stochastic process defined on (fi, J 7 , P), describing 
the evolution of our asset price over time. Time t = will be referred to 
as the "current" time, and we will be interested in an option with expiry 
time (TA) > 0, with T being some positive integer, and A some positive 
real- valued constant. Assume that we observe the process only at the dis- 
crete time points t = 0, A, 2A, . . . , TA, and that exercise decisions are made 
immediately after each observation. (We would typically take the time unit 
here to be one year, and A to be 1/252, representing one trading day. How- 
ever, both A and the time units can be chosen arbitrarily subject to the 
constraints mentioned above.) 
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Assume that, under a risk-neutral measure, the asset price S(t) evolves 
according to the Ito stochastic differential equations (SDEs) 3 



(1) dS(t) = rS(t) dt + a{Y{t))S{t)[^/l- P 2 dW 1 (t) + pdW 2 (t)}, 

(2) a(Y(t)) = exp(Y(t)), 



(3) dY(t) 



a[/3-^-Y(t) 
a 



dt + -fdW 2 (t), 



where r represents the risk-free interest rate (measured in appropriate time 
units), a(Y(t)) is referred to as the "volatility," p measures the co-dependence 
between the share price and volatility processes, a (volatility mean reversion 
rate), (3 (volatility mean reversion level), and 7 (volatility of volatility) are 
constants with a > 0, 7 > 0, {Wi(i)} and {W2(i)} are assumed to be two 
independent standard Brownian motions, and A is a constant referred to as 
the "market price of volatility risk" or "volatility risk premium" [Bakshi and 
Kapadia (2003a), Melenberg and Werker (2001) and Musiela and Rutkowski 
(1998)]. If we set 



dW?(t) = p 2 dW 1 (t) + pdW 2 (t)], 

we can more clearly see that p is the correlation between the Brownian mo- 
tions dW*(t) and dW 2 (t). The parameter p quantifies the so-called "leverage 
effect" between share prices and their volatility. 

Observe that A is not uniquely determined in the above system of SDEs. 
Since we are working in a stochastic volatility modeling framework, markets 
are said to be "incomplete" because volatility is not a traded asset and can- 
not be perfectly hedged. This is to be contrasted with the constant volatility 
Black and Scholes (1973) framework where a unique pricing measure exists 
and all risks can be perfectly hedged away. There is a range of possible 
risk-neutral measures when pricing under stochastic volatility models, each 
one having a different value of A. In fact, there are also risk-neutral mea- 
sures under which A varies over time. However, for the sake of simplicity, we 
will assume that A is a constant. Later in this paper, we illustrate how to 
estimate A. 

Equations (l)-(3) represent a stochastic volatility model that accommo- 
dates mean-reversion in volatility and we will use it to illustrate our Amer- 
ican option valuation methodology. It is an example of a nonlinear, non- 
Gaussian state-space model. Scott (1987) represents one of the earlier anal- 
yses of this stochastic volatility model. This same type of model has also 



3 Under the statistical (or real-world) measure, the asset price evolves on another prob- 
ability space. Under the real-world measure, the drift term r in equation (1) is replaced by 
the physical drift and the term — does not appear in the drift of equation (3). The change 
of measure between real-world and risk-neutral is formalized through a Radon-Nikodym 
derivative. 
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been studied in a Bayesian context by Jacquier, Poison, and Rossi (1994) 
and, more recently, in Jacquier, Poison, and Rossi (2004) and Yu (2005). 
It should be noted that our methodology is not constrained to a specific 
stochastic volatility model. The core elements of our approach would apply 
over a broad spectrum of stochastic volatility models such as, for instance, 
the Hull and White (1987) and Heston (1993) stochastic volatility models. 

Since our observations occur at discrete time-points 0, A, 2A, . . . , we will 
make extensive use of the discrete-time approximation to the solution of the 
risk-neutral stochastic differential equations (l)-(3) given by 

(4) S t+1 = S t ■ expj (r - < ^f\ A + a t+1 VA[y/l - p 2 Z w + pZ 2 , t+1 }\, 

(5) a t+ i =exp(Y t+1 ), 



e -2aA 



(6) Y t+1 =(3* + e~ (Yt -n + l\\{ Ya ) Z2 ' t+U 

where {Zij}, i = 1,2, is an independent and identically distributed (i.i.d.) 
sequence of random variables with standard normal [iV(0, 1)] distributions, 

a 

and all other parameters are as previously defined. Thus, St and Yt represent 
approximations, respectively, to S(tA) and Y(tA). [The expression for Yj is 
obtained directly from the exact solution to (3), while the expression for St 
is the solution to (1) that one would obtain by regarding at to be constant 
on successive intervals of length A. The approximation for St becomes more 
accurate as A becomes smaller.] 

It will sometimes be convenient to express (4) in terms of the log-returns 
R t+ i = \og(S t +i/S t ), as 



(7) 



Rt+i = (r- ^\ A + a t+1 VA(y/l-pPZ 1>t+1 + P Z 2 ,t+i] 



To complete the specification of the model, we can assign Yq a normal dis- 
tribution, 

(8) Yq ~n(p;£ 

This is simply the stationary (limiting) distribution of the first-order au- 
toregressive process {Y t }. However, for practical purposes, it will usually be 
preferable to replace this distribution by the conditional distribution of Yq, 
given some historical observed price data S-\,S~2, ■ ■ ■ ■ Another reasonable 
starting value for Yq is a historical volatility based measure (i.e., the log of 
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the standard deviation of a few past observations). Additionally, there are 
examples of stochastic volatility models where exact simulation is not feasi- 
ble. In such cases, we must resort to numerical approximation schemes such 
as the Euler-Maruyama method (or any other related approach). Kloeden 
and Platen (2000) illustrate several numerical approximation schemes that 
could be applied to simulate from a stochastic volatility model where no 
exact simulation methodology exists. 

3. Dynamic programming and option pricing. The arbitrage-free price 
of an American-style option is 



where r is a random stopping time at which an exercise decision is made, 
T is the set of all possible stopping times with respect to the filtration 
{Ft, t = 0, 1, . . .} defined by 



Ekn(-) represents the expectation, under a risk-neutral probability measure, 
of its argument, and g{s) denotes the payoff from exercise of the option 
when the underlying asset price is equal to s. For example, a call option 
with strike price K has payoff function g(s) = max(s — K, 0), and a put 
option with strike price K has payoff function g(s) = max(K — s,0). [The 
analysis in this paper is in the context of American put options since these 
options typically serve as canonical examples of early-exercise derivatives; 
see Karatzas and Shreve (1991, 1998) and Myneni (1992) for key mathemat- 
ical results concerning American put options.] Since r is a stopping time, 
the event {r < t} must be JVmeasurable, or equivalently, the decision to 
exercise or hold at a given time must be made only based on observations 
of the previous and current values of the underlying price process. To allow 
for the possibility that the option is never exercised, we adopt the conven- 
tion that t = oo if the option is not exercised at or before expiry, along 
with the convention that ex.p(—roo)g(S 00 ) = 0. In order to price the Amer- 
ican option, we need to find the stopping time r at which the supremum 
in (9) is achieved. While it is not immediately obvious how one might search 
through the space of all possible stopping times, this problem is equivalent 
to a stochastic control problem, which can be solved (in theory) using the 
dynamic programming algorithm, which was originally developed by Bell- 
man (1953). Ross (1983) also describes some key principles of stochastic 
dynamic programming and a thorough treatment is presented, in the con- 
text of financial analysis, by Glasserman (2004). 

Our objective is to find the optimal stopping rule (equivalently, the opti- 
mal exercise time) while taking into account the latent stochastic volatility 



(9) 



sup£ , RN [exp(-rr)ff(5 T )] 
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process. The key difficulty arises because the price itself is not Markovian. 
We would like to get around this by using the fact that the bivariate process, 
composed of both price and volatility, is Markovian, but unfortunately we 
can only observe one component of that process. It is known that one can 
still find the optimal stopping rule if one can determine (exactly) the con- 
ditional distribution of the unobservable component of the Markov process, 
given current and past observable components [see, e.g., DeGroot (1970) 
and Ross (1983)]. In such a case, we can use algorithms like the ones de- 
scribed in Brockwell and Kadane (2003) to find the optimal decision rules. 
These algorithms effectively integrate the utility function at each point in 
time over the distribution of unknown quantities given observed quantities. 
Unfortunately, in the context of this paper, we cannot obtain the required 
conditional distributions exactly, but we can find close approximations to 
them. We will therefore approach our pricing problem by using numerical 
algorithms in the style of Brockwell and Kadane (2003), in conjunction with 
close approximations to the required distributions. In doing so, we make the 
assumption (stated later in this paper) that our distributional approxima- 
tions are close to the required conditional distributions. 

3.1. General method. The dynamic programming algorithm constructs 
the exact optimal decision functions recursively, working its way from the 
terminal decision point (at time TA) back to the first possible decision 
point (at time 0). In addition, the procedure yields the expectation in the 
expression (9), which is our desired option price. The algorithm works as 
follows. 

Let dt £ {E, H} denote the decision made immediately after observation of 
St, either to exercise (E) or hold (H) the option. While either decision could 
be made, only one is optimal, given available information up to time t. (In 
the event that both are optimal, we will assume that the agent will exercise 
the option.) We denote the optimal decision, as a function of the available 
observations, by 



Here and in the remainder of the paper, we adopt the usual convention of 
using Sj (upper case) to denote the random variable representing the equity 
price at time j, and Sj (lower case) to denote a particular possible realization 
of the random variable. Next, let 



and for t = 0, 1, . . . ,T — 1, let ut(so, . . . ,st,dt) denote the discounted ex- 
pected payoff of the option at time (tA), assuming that decision dt is made, 



<%(s ,...,s t )e{E,H}. 



(10) 




d T = E, 
d T = H, 
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and also assuming that optimal decisions are made at times (t + 1) A, (t + 
2) A, . . . ,TA. It is obvious that, at the expiration time, 

d^(s , . . . , s T ) = argmax u T (s , . . . , s T , d T ). 

d T e{E,H} 

The optimal decision functions d T _^ . . . ,d$ can then be obtained by defining 
(11) u* t (s ,...,st) = ut{s ,...,st,d* t (s ,...,s t )), t = 0,...,T, 
and using the recursions 
u t (so,...,s t ,d t ) 

(12) 

= {g(st), (k = E, 

\ Er^(u^ +1 (S , . . .,S t+ i)\S = s , • • • ,S t = s t ), d t = H, 



(13) dl(s ,...,s t ) = argmax u t (s , ■■■ ,s t ,d t ). 

d t e{E,H} 

These recursions are used sequentially, for t = T — 1,T — 2,...,0, and yield 
the (exact) optimal decision functions (fjf , t = 0, .. . ,T. (Each dt is optimal 
in the space of all possible functions of historical data SQ,...,st-) The cor- 
responding stopping time r is simply 

r = min({i G {0, . . . ,T}\d t = E} U {oo}). 

Furthermore, the procedure also gives the risk-neutral option price, since 

Uq(s ) = sup^ RN [exp(-rr)5t(5 r )]. 

In practice, it is generally not possible to compute the optimal decision func- 
tions, since each d\ needs to be computed and stored for all (infinitely many) 
possible combinations of values of its arguments so, • • • , St- However, in what 
follows we will develop an approach which gives high-quality approximations 
to the exact solution. The approach relies on exploiting some key features 
of the American-style option pricing problem. 

3.2. Equivalent formulation of the dynamic 'programming "problem. First, 
it follows from the Markov property of the bivariate process {(St,Yt)} that 
we can transform the arguments of the decision functions so that they do 
not increase in number as t increases. Let us define 

(14) 7Tt(yt)dt = P(Y t £dyt\So = so,...,St = st), t = 0,...,T, 

where Sj denotes the observed value of Sj, so that 7r 4 (-) denotes the condi- 
tional density of the distribution of Yj (with respect to the Lebesgue mea- 
sure) , given historical information Sq = so , . . . , St = St ■ Then we have the 
following result. 
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Lemma 3.1. For each t = 0, ... ,T, u t (so, ... ,st,dt) can be expressed as 
a functional of only St, vrj and dt, that is, 

ut{s ,...,s t ,d t ) =u t (s t ,-Kt,dt). 

Consequently, for each t = 0, . . . , T, d%(so, . . . ,st) and u% (so, • • • , St) can also 
be expressed, respectively, as functionals 

d${s ,...,S t ) =d*(s t ,7T t ), 

n*(s ,...,Si) = n*(s t ,7r t ). 

This is a special case of the well-known result [see, e.g., Bertsekas (2005, 
2007)] on the sufficiency of filtering distributions in optimal control prob- 
lems. A proof is given in the Appendix. It is important to note here that the 
argument 7Tf to the functions ut, d\ and u\ is a function itself. 

Lemma 3.1 states that each optimal decision function can be expressed 
as a functional depending only on the current price St and the conditional 
distribution of If (the latent process that drives volatility) given observations 
of prices sq, ■ ■ ■ ,st- In other words, we can write the exact equivalent form 
of the dynamic programming recursions, 



(15) Mst,^ T )={$^ %\l 

(16) u t (s t ,7r t ,dt) 



(g{st), d t = E, 

\ E RN [u* +1 (S t+ l,TTt + l)\St = S t ,7T t ], d t = H, 

where <%{s u n t ) = argmax dtg{£ ; jH | u t (s t , n, d t ), and u$(s t ,n) = ut(s t ,ir t , 
d* t (s t ,TTt))- 

3.3. Summary vectors and sequential Monte Carlo. In order to imple- 
ment the ideal dynamic programming algorithm as laid out in Section 3.2, 
we would need (among other things) to be able to determine the filtering 
distributions 7it(-), t = 0, 1, ...,T. Unfortunately, since these distributions 
are themselves infinite-dimensional quantities in our modeling framework, 
we cannot work directly with them. We can, however, recast the dynamic 
programming problem in terms of ^-dimensional summary vectors 

(17) Q t = \ 

where /i(-)> • • • , //(•) are some functionals. The algorithms we introduce can 
be used with any choice of these functionals, but it is important that they 
capture key features of the distribution. As typical choices, one can use the 
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moments of the distribution. In the examples in this paper, we take 1 = 2, 
fii^t) — E[7r t ] = f X7r t (x)dx and /^(tr) = std.dev.(7Tt). Adding components 
to this vector typically provides more comprehensive summaries of the re- 
quired distributions, but incurs a computational cost in the algorithms. 

Ultimately, we need to be able to compute 7r t (-) and thus Qt- One way 
to do this is with the use of sequential Monte Carlo simulation [also known 
as "particle filtering," see, e.g., Doucet, de Freitas, and Gordon (2001), for 
discussion, analysis and examples of these algorithms]. The approach yields 
samples from (arbitrarily close approximations to) the distributions 7T£, and 
thus allows us to evaluate components of Qt- A full treatment of sequential 
Monte Carlo methods is beyond the scope of this paper, but the most basic 
form of the method, in this context, appears in Algorithm 1. 

This algorithm yields T + 1 collections of particles, {y^\ ■ ■ ■ ,Vt }i t = 
0, 1, . . . , T, with the property that for each t, {y^ , . . . , y^} can be regarded 
as an approximate sample of size m from the distribution nt . The algorithm 
has the convenient property that as m increases, the empirical distributions 
of the particle collections converge to the desired distributions nt- Typically 
m is chosen to be as large as possible subject to computational constraints; 
for the algorithms in this paper, we choose m to be around 500. There 
are several ways to improve the sampling efficiency of Algorithm 1, namely, 



Algorithm 1 Sequential Monte Carlo estimation of 7To , • • ■ , txt 

Initialization (t = 0). Choose a number of "particles" m > 0. Draw a sam- 
ple {y^\ ■ ■ ■ ,^0^} from the distribution of Yq [see equation (8)]. 
for t = l,...,T do 

• Step 1: Forward simulation. For i = 1, 2, . . . , m, draw y\ from the distri- 
bution p(y t \Y t -i = j/^-i)- [This distribution is Gaussian in our example, 
specified by (6).] 

• Step 2: Weighting. Compute the weights 

(18) w f) = p (r t \Y t = yf\ 

where the term on the right denotes the conditional density of the log- 
return R t , given Y t = y^\ evaluated at the observed value r t . [These 
weights are readily obtained from (7).] 

• Step 3: Resampling. Draw a new sample {y^\ . . . ,yl m ^} by sampling 

with replacement from {y^\ ■ ■ ■ ,y^ m ^}, with probabilities proportional 

. (1) (m) 
to W t . 

end for 
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ensuring that the weights in equation (18) do not lead to degeneration of 
the particles. [For more details and various refinements of this algorithm, 
refer to, among others, Kitagawa and Sato (2001), Liu and West (2001) and 
Pitt and Shephard (1999).] 

3.4. Approximate dynamic programming solution. In order to motivate 
our proposed American option pricing methodology, we state an assumption 
that describes how Qt relates to past price history captured in ir t . 

Assumption 3.2. The summary vector Q t is "close to sufficient," that 
is, it captures enough information from the past share price history 
(S t ,S t -i,...), so that p(S t +i\Qt) is close to p(S t +i\St,St-i,S t -2,---)- 

(If the summary vector was sufficient, then the dynamic programming 
algorithm would yield exact optimal decision rules. Of course, even in this 
ideal case, the numerical implementation of a dynamic programming algo- 
rithm introduces some small errors.) 

It is beyond the scope of this paper to quantify the "closeness" between 
p(St+i\Qt) and p(St+i\St,St-i, St-2, ■ ■ ■)■ One could, however, theoretically 
do so using standard distribution distance measures and perform an analysis 
of the propagation of the error through the algorithms discussed in this 
paper. 

Combining the equivalent form of the dynamic programming recursions (15), 
(16) along with Assumption 3.2, we can approximate the dynamic program- 
ming recursions by 



(19) 




d T = E, 
d T = H, 



Ut(s t ,Qt,dt) 



(20) 




d t = E, 
d t = H, 



with 





d t £{E,H} 



and 



(22) 




where Qt is the vector summarizing 7Tt(-) [cf. equation (17)]. The recur- 
sion (20) is convenient because the required expectation can be approxi- 
mated using the core of the sequential Monte Carlo update algorithm. 
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Assumption 3.2 motivates a practical, approximate solution to the ideal 
formulation of the dynamic programming problem described in Section 3.2. 
In the special case of linear Gaussian state-space models, the vector Qt would 
form a sufficient statistic since 7Tj would be Gaussian, and our approach 
would reduce to the standard Kalman filtering procedure [see Chapter 12 
of Brockwell and Davis (1991) for details]. For the case of nonlinear, non- 
Gaussian state-space models, such as the illustrative one in this paper, ir t is 
not summarized by a finite-dimensional sufficient statistic. Assumption 3.2 
permits an approximate solution to our American option pricing problem 
and, in particular, motivates the conditional expectation in equation (20) by 
using Qt to summarize information about the latent volatility using the past 
price history. We next introduce two algorithms for pricing American-style 
options, making use of the summary vectors Qt described in Section 3.3, 
and we illustrate how our algorithms perform through a series of numerical 
experiments. 

4. Pricing algorithms. 

4.1. A least-squares Monte Carlo based approach. The popular least- 
squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) 
relies essentially on approximating the conditional expectation in (12) by 
a regression function, in which one or several recent values St,St-i, ■ ■ ■ are 
used as explanatory variables. Since the conditional expectation is in fact 
(exactly) a functional of the filtering distribution m, we might expect to ob- 
tain some improvement by performing the regression using summary features 
of 7Tf as covariates instead. This variant of the LSM algorithm, describing the 
simulation component of our pricing methodology, is stated in Algorithm 2. 

Remark. For the stochastic volatility model used in this analysis, mea- 
sures of center and spread will suffice to capture the key features of the 
distribution. Therefore, our summary vector Qt = (//*,Ct) m Algorithm 2 
describes the mean and standard deviation of the filtering distribution. 

Remark. The summary vector in Algorithm 2 can include as many key 
measures of the filtering distribution nt{yt) as needed to accurately describe 
it. Other types of stochastic volatility models may require additional mea- 
sures that capture potential skewness, kurtosis or modalities in the filtering 
distribution. One can learn of the need for such measures by doing an em- 
pirical analysis on historical data using Algorithm 1 to gain insights into the 
behavior of the filtering distribution irt(yt)- 

Next, we illustrate in Algorithm 3 the implementation of the LSM regres- 
sion step using our summary vector Qt of 7Tj. 
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Algorithm 2 Preliminary simulation of trajectories 
for n = 1, . . . , N do 

Step 1. Simulate a share price path {Sq U \ . . . , S^} with Sq^ = so from 
the risk-neutral stochastic volatility model [equations (4)-(6)]. 

Step 2. Apply Algorithm 1 (sequential Monte Carlo algorithm) replac- 
ing {Sq, . . . , St} by the simulated path {S^, . . . , S^} to obtain approx- 
imations to the filtering distributions {nt, t = 0, . .. , T} for the simulated 
path. 

Step 3. Use the estimate of the filtering distribution computed above 
in Step 2 to construct a summary vector, Q n , of vr n (y n ) that stores key 
measures of the filtering distribution such as the mean, standard deviation, 
skew, etc. 

Step 4- Store the vector (S n ,Q n ). 



Repetition. Repeat above steps to create M independent paths that con- 
tain information on the simulated share prices St and the summary vector 
Qt for all time points t = 0, 1, 2, . . . ,N. 



Remark. The regression in Step 3 of Algorithm 3 uses basis functions 
of the share price St and the summary vector Qt to form the explanatory 
variables. We choose Laguerre functions as basis functions. Our summary 
vector Qt = (^tXt) consists of the mean and standard deviation of the fil- 
tering distribution irt(yt)- The design matrix used in the regression at time 
point k consists of the first two Laguerre functions in Sk, [Xk and ( k , and a 
few cross-terms of these covariates. Specifically, our covariates used in the 
regression at time k (in addition to the intercept term) are 

Lo(Sk), Li(Sk), L (fi k ), Li(fi k ), ^o(Cfc)) ^i(Cfc)) 

L {S k ) * L (fj, k ), L (S k ) * L Q (Ck), Li(Sh) *L 1 (n k ), 

Li(S'fc) *Li(0fc), L (fi k ) *L ((k), Lx(fi k )*L 1 (Ck), 

where Lq{x) = e~ x / 2 and L\(x) = e _X//2 (l — x) and, in general, L n (x) = 
e~ x ^ 2 ^-^:(x n e~ x ). Other choices of basis functions, such as Hermite poly- 
nomials or Chebyshev polynomials, are also reasonable alternatives that 
could be used to implement the least-squares Monte Carlo algorithm of 
Longstaff and Schwartz (2001). 

Remark. Longstaff and Schwartz (2001) actually adjust Step 4 of Al- 
gorithm 3 as follows: 



end for 




d t = E, 
d t = H 
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Algorithm 3 The least-squares Monte Carlo algorithm of Longstaff and 
Schwartz (2001) and the dynamic programming step 
Initialization 

Sub-step A. Run Algorithm 2 to obtain M independent paths, where 
each path simulates realizations of the share price St and the summary 
vector Qt for time points t = 1, 2, . . . ,N. 

Sub-step B. Compute the option price at t = N along each of the 
M paths by evaluating the payoff function g^Sx), resulting in M option 
values {u?pi, ■■■ M }. 
for t = JV-l,JV-2,...,l do 

Step 1. Evaluate the exercise value g(S^) for i = 1, . . . , M. 

(i) (i) 

Step 2. Compute basis functions of S\ and Q\ for i = 1, . . . , M. 
Step 3. Approximate the hold value of the option at time t by 

v 

Eiw[ut + i(St + i,Qt+i)\St = s t ,Q t = qt] ~y~]Ptk<j>k(S t ,Qt), 

k=l 

where the (3tk are the coefficients of a regression (with p explanatory vari- 
ables) of the discounted time t + 1 American option prices, ul +l , on basis 
functions 4>k of St and Q t . 

Step 4- For i = 1, . . . , M, compute the exercise/hold decision according to, 
u t {Sf\Qf\dt) 

= {g{st ] ), d t = E, 

\ E m [u* t+1 (St +1 ,Q t+ i)\sP = s?\Q? ) = qf\ alt = H, 

end for 

Average the option values over all M paths to compute a Monte Carlo 
estimate and standard error of the American option price. 



in order to avoid computing American option prices with a slight upward 
bias due to Jensen's inequality; we follow their recommendation and use this 
adjustment. They also suggest using only paths where g(St) > (i.e., the 
"in-the-money" paths) as a numerical improvement. However, we could use 
all paths since the convergence of the algorithm also holds in this case [see 
Clement, Lamberton, and Protter (2002)]. Therefore, we use all paths in our 
implementation of Algorithm 3, as this produces similar results. 

4.2. A grid-based approach. We now present a grid-based algorithm for 
determining approximate solutions to the dynamic programming problem. 
This algorithm is based on a portion of the research work in Rambharat 
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(2005). The approach uses the vectors Qt summarizing the filtering distri- 
butions TTt(-) as arguments to the decision and value functions. In contrast 
to the Monte Carlo based approach of Section 4.1 where we directly use a 
summary vector Qt in the LSM method, the grid-based technique requires 
a distribution to approximate ixt- This distribution will typically be param- 
eterized by the summary vector when used to execute the grid-based algo- 
rithm. In our illustrative pricing model, we use a Gaussian distribution to 
approximate the filtering distribution, 7it, at each time point t. Kotecha and 
Djuric (2003) provide some motivation for using Gaussian particle filters, 
namely, Gaussian approximations to the filtering distributions in nonlinear, 
non-Gaussian state-space models. However, any distribution that approxi- 
mates TTt reasonably well could be used for our purposes. Such a choice will 
depend on the model and an empirical assessment of 7T(. 

Since the steps of the sequential Monte Carlo algorithm are designed to 
make the transition from a specified 7Tf to the corresponding distribution 
7Ti_|_i, we can combine a standard Monte Carlo simulation approach with 
the use of Steps 1, 2 and 3 of Algorithm 1 to compute the expectation. 
To be more specific, the next algorithm computes the expectation on the 
right-hand side of equation (20). 

Algorithm 4 works by drawing pairs (st+ijfft+i) f rom f ne conditional dis- 
tribution of (St+i,Qt+i), given St = St, Qt = qt, and using these to compute 
a Monte Carlo estimator of the required conditional expectation. Hence, we 
are able to evaluate ut at various points, given knowledge of t , and will 
thus form a key component of the backward induction step. Note that this 
algorithm also relies on Assumption 3.2 in its use of the summary vector Qt. 

Since we will be interested in storing the functions &£(•,•) and d\ (•, •), we 
next introduce some additional notation. Let 

(24) g = { 9i eR d * +1 ,i = l,2,...,G} 

denote a collection of grid points in where d q denotes the dimen- 

sionality of Qt- These are points at which we will evaluate and store the 
functions ujf and d*. We will typically take 

(25) g = g 1 xg 2 x---xg dq+1 , 

where g± is a grid of possible values for the share price, and gj, j > 1, is a 
grid of possible values for the (j — l)st component of Qt- 

We state our grid-based pricing routine in Algorithm 5. [This is a stan- 
dard gridding approach to solving the dynamic programming equations, as 
described, e.g., in Brockwell and Kadane (2003).] 

Remark. As is the case for the Monte Carlo based approach that we 
describe in this paper, the grid-based scheme stated in Algorithm 5 also 
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Algorithm 4 Estimation of conditional expectations in equation (20) 

Draw values {y\ , i = 1, . . . , m} independently from a distribution chosen 
to be consistent with a parameterization of the summary vector Qt = qt- 

for j = 1, . . . , n do 

• Draw U ~ Unif(l, . . . , m). Then draw s^2i from the conditional distri- 
bution of St+i, given St = St and Y t = y[ . 

• Go through Steps 1 and 2 of Algorithm 1, but in computing weights 
{wt+n i = 1, . . . , m}, replace the actual log-return rt+i by (the simu- 
lated log-return) r^h = log(sH_\/st). 

• Go through Step 3 of Algorithm 1, to obtain . . . , Vt+l}- 

(i) 

• Compute q t +i as the appropriate summary vector, 
end for 

Compute the approximation 

1 n 

(23) Ejw[u$ + i(St+i,Qt+i)\St = s t , Qt = Q.t] — ~ 5^^t+i( s t+i» 9t+i)- 

n j=l 



Algorithm 5 Grid-based summary vector American option pricing algorithm 
Initialization. For each g GG, evaluate 

UT{g,d T ), <#r(flO, and ^(ff), 
using equations (19), (21), and (22). Store the results. 

for t = T- 1,T- 2,...,0 do 
for each g £ Q do 
Evaluate 

u t {g,d t ), d* t (g), and 

using equations (20), (21), and (22). To evaluate the expectations in 
equation (20), use Algorithm 4. Store the results, 
end for 
end for 

Evaluate the option price 

(26) price = Uq(s , q ), 

where sq is an observed initial price and go is an initial (summary) measure 
of the volatility process. 
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gives an option price which assumes that no information about volatility is 
available at time t = 0. In the absence of such information, we just assume 
that the initial log-volatility Yo can be modeled as coming from the limiting 
distribution of the autoregressive process {If} and take qo as the summary 
measure of this distribution (e.g., its mean and variance if the limiting dis- 
tribution is Gaussian). However, in most cases, it is possible to estimate 
log- volatility at time t = using previous observations of the price process 
{St}. In such cases, go would represent the mean and conditional variance of 
Yo, given "previous" observations S-i, . . . , 5_/j for some h > 0. These 
could be obtained in a straightforward manner by making use of the se- 
quential Monte Carlo estimation procedure described in Algorithm 1 (ap- 
propriately modified so that time — h becomes time 0). We could also base 
qo on a historical volatility measure (i.e., the standard deviation of a few 
past observations). 

Remark. As with any quadrature-type approach, grid ranges must be 
chosen with some care. In order to preserve quality of approximations to 
the required expectations in Algorithm 5, it is necessary for the ranges of 
the marginal grids Qj to contain observed values of the respective quantities 
with probability close to one. Once the stochastic volatility model has been 
fit, it is typically relatively easy to determine such ranges. 

Remark. The evaluation of ut(g,dt) in the previous algorithm is per- 
formed making use of the Monte Carlo approximation given by (23). Obvi- 
ously the expression relies on knowledge of Uj +1 (-, •), but since we have only 
evaluated Ut+i at grid points g G Q, it is necessary to interpolate in some 
manner. Strictly speaking, one could simply choose the nearest grid point, 
and rely on sufficient grid density to control error. However, inspection of 
the surface suggests that local linear approximations are more appropriate. 
Therefore, in our implementations, we use linear interpolation between grid 
points. 

4.3. Numerical experiments. The pricing algorithms outlined in Sections 
4.1 and 4.2 demonstrate how to price American-style options in a latent 
stochastic volatility framework. These pricing algorithms are computation- 
ally intensive, however, their value will depend on how accurately they price 
American options in this partial observation setting. We next illustrate the 
applicability of our pricing algorithms through a series of numerical ex- 
periments. We assess the accuracy of our valuation procedure by pricing 
American-style put options using the following methods. (All methods use 
the current share price St in the LSM regression, however, the difference in 
each method is how volatility is measured.) 
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• Method A (basic LSM). This method simulates asset prices St according 
to the model (4)-(6), however, the regression step in the LSM algorithm 
uses a few past observations (St-i, St-2, ■ ■ •) as a measure of volatility 
in lieu of the summary vector Qt. This procedure is most similar to the 
traditional LSM approach of Longstaff and Schwartz (2001). 

• Method B (realized volatility). This procedure simulates asset prices St ac- 
cording to the model (4)-(6), however, for each time point k (k = 1, . . . , N) 
and for each path / (7 = 1,...,M), we compute a measure of realized 
volatility 



where Rjj is the return at time j along path I. The LSM regression step 
then proceeds to use the realized volatility measure RVk . at each time 
point k as a measure of volatility. 

• Method C (MC/Grid). This method uses the algorithms described in Sec- 
tions 4.1 and 4.2 to price American-style options in a latent stochastic 
volatility framework either using (i) a pure simulation-based Monte Carlo 
(MC) approach or (ii) a Grid-based approach. Along with the simulated 
share prices St, this approach makes extensive use of the summary vectors 
Qt that capture key features of the volatility filtering distribution lit ■ 

• Method D (observable volatility). This approach simulates the asset prices 
St and the volatility at, however, it assumes that both asset price and 
volatility are observable. This is the full observation case that we will use 
as a benchmark. Whichever of methods A, B or C is closest to method D 
will be deemed the most accurate. 

Figure 1 presents an illustrative example of the difference in American put 
option prices between method A and method D for several types of option 
contracts. One could think of this illustration as reporting the difference 
in pricing results for two extremes: the minimum observation case (method 
A or basic LSM) and the full observation case (method D or observable 
volatility). This figure uses parameter settings where stochastic volatility 
is prevalent. The differences in option prices indeed show that stochastic 
volatility matters when computing American option prices, especially when 
volatility of volatility is high [i.e., when 7 in equation (3) is large]. 

In order to demonstrate the value of our proposed approach, method C 
(MC/Grid), we must illustrate that it produces more accurate American 
option pricing results than either of the simpler (and faster) methods (A 
and B). We experimented with various model and option parameter set- 
tings and found situations where simpler methods work just as well as our 
approach. However, we also found model/option parameter settings where 
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Example 1 Example 2 




0.3 0.4 0.5 0.6 0.7 0.3 0.4 0.5 0.6 0.7 

inital vol. (S0=23) inital vol. (S0=25) 



Fig. 1. A comparison of American put option pricing results between methods A (min- 
imum observation case, dashed line) to method D (full observation case, solid line) for 
various parameter settings. 
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Table 1 

Description of the stochastic volatility and American option pricing inputs to the 
numerical experiments comparing methods A, B, C and D. We set the number of 
particles m = 1000 for method C and report Macintosh OS X (V. 10.4-11) compute times 

for all cases 



Experiment no. Parameters (p, a, (3, 7, A) Option inputs (K, T, r, So, (To) 



1 


(-0.055, 3.30, log(0.55), 0.50, 


-0.10) 


(23, 10, 0.055, 20, 0.50) 


2 


(-0.035, 0.25, log(0.20), 2.10, 


-1.0) 


(17, 20, 0.0255, 15, 0.35) 


3 


(-0.09, 0.95, log(0.25), 3.95, - 


-0.025) 


(16, 14, 0.0325, 15, 0.30) 


4 


(-0.01, 0.020, log(0.25), 2.95, - 


-0.0215) 


(27, 50, 0.03, 25, 0.50) 


5 


(-0.03, 0.015, log(0.35), 3.00, 


-0.02) 


(100, 50, 0.0225, 90, 0.35) 


6 


(-0.017, 0.0195, log(0.70), 2.50, 


-0.0155) 


(95, 55, 0.0325, 85, 0.75) 


7 


(-0.075, 0.015, log(0.75), 6.25, 0.0) 


(16, 17, 0.0325, 15, 0.35) 


8 


(-0.025, 0.035, log(0.15), 5.075, 


-0.015) 


(18, 15, 0.055, 20, 0.20) 


9 


(-0.05, 0.025, log(0.25), 4.50, - 


-0.015) 


(19, 25, 0.025, 17, 0.35) 


our 


approach outperforms the other pricing 


; methods. Table 1 describes the 



settings of our numerical experiments. This table outlines selected values of 
the model parameters and the American option pricing inputs for use in the 
pricing algorithms (K is the strike price, T is the expiration in days, r is the 
interest rate, So is the initial share price, and oq is a fixed initial volatility). 

Table 2 reports the American option pricing results (and standard er- 
rors) for methods A (basic LSM) and B (realized volatility) and Table 3 
reports the pricing results (and standard errors) for methods C (MC/Grid) 
and D (observable volatility). The computation times for all methods are 
also reported. For methods A, B and D, we used M = 15,000 LSM paths in 
the pricing exercise. When running method C, we observed negligible differ- 
ences between the MC-based and grid-based approaches. Hence, we report 
the pricing results (and standard errors) for method C using Algorithms 2 
and 3 with M = 15,000 LSM paths. (The MC-based approach also permits 
comparisons to the other approaches in terms of standard errors.) 

There are some notable observations to be made from the results of the 
numerical experiments in Tables 2 and 3. First, when the effect of volatility is 
weak/moderate and the effect of mean reversion is moderate/strong (experi- 
ments 1 and 2), all methods result in similar American option pricing results. 
There are cases in the literature that discuss fast mean reversion [see, e.g., 
Fouque, Papanicolaou, and Sircar (2000)], and in these cases it would cer- 
tainly make sense to use a faster pricing method. However, in the situations 
where we experiment with dominant volatility effects (experiments 3 through 
9), although not often encountered in the empirical stochastic volatility lit- 
erature, but pertinent to volatile market scenarios, method C comes closest 
to method D. One should note that in cases of dominant volatility method B 
does better than method A, as it uses a more accurate measure of volatility. 
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Method C, however, which actually makes use of the filtering distributions 
7Tt, comes within standard error of method D (observable volatility case). 
Additionally, we observed when stochastic volatility is dominant and the 
American option has a long maturity and is at /in-the-money, the difference 
in pricing results is more pronounced. 

Method C (MC/Grid) is a computationally intensive approach, although 
this feature is shared by many Monte Carlo and grid-based techniques. 
Clearly, the proposed pricing algorithm using method C is not competitive 
in terms of computation time. The simpler methods (A and B) are much 
faster and also accurate under strong mean reversion and weak stochastic 
volatility. It should be observed, however, that the pricing accuracy is much 
higher using method C, as it is always within standard error of method D 
(observable volatility). The accuracy of method C holds in all model/option 
parameter settings. One does not require special cases (high mean rever- 
sion, low volatility) or special option contract features (long/short maturity, 
in/at /out-of-the money) for our approach to be competitive in terms of ac- 
curacy. We also ascertain the robustness of our approach by computing (and 
storing) the exercise rule from each of the four methods (A through D). We 
use the rule to revalue the American put options on an independent, com- 
mon set of paths. The results are described in Tables 4 and 5; note that the 

Table 2 

American put option pricing results (and compute times) using methods 
A (basic LSM with past share prices) and B (realized volatility as an 
estimate for volatility) 



Experiment no. 


A (basic LSM) 


B (realized volatility) 


1 




3.044 (0.00985) 


3.038 (0.00999) 


Time 


(sec) 


18 


12 


2 




2.147 (0.00947) 


2.154 (0.00975) 


Time 


(sec) 


15 


16 


3 




1.212 (0.00759) 


1.231 (0.00842) 


Time 


(sec) 


16 


14 


4 




3.569 (0.0242) 


4.153 (0.0358) 


Time 


(sec) 


37 


39 


5 




12.994 (0.0743) 


15.067 (0.120) 


Time 


(sec) 


48 


39 


6 




18.623 (0.121) 


21.476 (0.168) 


Time 


(sec) 


57 


41 


7 




1.588 (0.0120) 


1.843 (0.0186) 


Time 


(sec) 


18 


14 


8 




0.0945 (0.00367) 


0.145 (0.00553) 


Time 


(sec) 


11 


15 


9 




2.437 (0.0132) 


2.667 (0.0197) 


Time 


(sec) 


21 


21 
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Table 3 

American put option pricing results ( and compute 
times) using methods C (MC/Grid) and D (observable 
volatility) 



Experiment no. 


C (MC/Grid) 


D (observable volatility) 


1 




3.051 (0.0108) 


3.046 (0.00981) 


Time 


(sec) 


162 


15 


2 




2.160 (0.00961) 


2.173 (0.00990) 


Time 


(sec) 


310 


15 


3 




1.257 (0.00901) 


1.260 (0.00901) 


Time 


(sec) 


218 


12 


4 




4.688 (0.0439) 


4.734 (0.0441) 


Time 


(sec) 


760 


37 


5 




16.208 (0.138) 


16.331 (0.138) 


Time 


(sec) 


815 


39 


6 




22.758 (0.184) 


22.996 (0.185) 


Time 


(sec) 


900 


46 


7 




1.994 (0.0216) 


2.045 (0.0225) 


Time 


(sec) 


266 


17 


8 




0.168 (0.00691) 


0.172 (0.00704) 


Time 


(sec) 


232 


15 


9 




2.850 (0.0233) 


2.861 (0.0235) 


Time 


(sec) 


416 


20 



respective pricing results are similar to those stated in Tables 2 and 3, hence 
leading to similar conclusions. 

The differences in pricing results, as noted in Tables 2 and 3 (or Tables 4 
and 5) , are relevant when thinking about how some option transactions take 
place in practice. For example, on the American Stock Exchange (AMEX), 
American-style options have a minimum trade size of one contract, with 
each contract representing 100 shares of an underlying equity. 4 Hence, the 
discrepancies between methods A and B and our proposed method C could 
be magnified under such trading scenarios. Our proposed approach would 
be especially useful when constructing risk management strategies during 
volatile periods in the market. 

We also repeated the numerical experiments in Table 1 using a first- 
order Euler discretization of the model as opposed to exact simulation. The 
Euler-based results are reported in Tables 12 and 13 of Section A. 4. Upon 
inspection, one observes that the corresponding results for each of methods 
A through D are virtually identical. Thus, if a model does not permit exact 
simulation, a numerical procedure such as the first-order Euler scheme or 



4 Source: http://www.amex.com. 
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any other scheme [see, e.g., Kloeden and Platen (2000)] should suffice for 
the purposes of executing our pricing algorithm. 

We now illustrate a statistical application of our proposed pricing method- 
ology. We first demonstrate how to estimate model parameters. Next, we 
make inference, using observed American put option prices, on the mar- 
ket price of volatility risk. Since method C outperforms either A or B in 
all model/option settings, we will use it as our primary tool for statistical 
analysis. 

5. Statistical estimation methodology. 

5.1. Model parameter estimation. The stochastic volatility model, out- 
lined in equations (1), (2) and (3), has been analyzed extensively in previ- 
ous work, namely, Jacquier, Poison, and Rossi (1994, 2004) and Yu (2005). 
These authors analyze the model under the statistical (or "real- world" ) mea- 
sure as described in a Section 2 footnote. Model estimation of share prices 
under the real- world measure would only require data on price history. Esti- 
mation of model parameters in a risk neutral setting, however, is a bit more 
involved as we need both share price data on the equity as well as option 

Table 4 

American put option pricing results (and compute times) 
using the exercise rule from methods A and B on an 
independent, common set of paths. The results are 
comparable to those given in Table 2 



Experiment no. 


A (basic LSM) 


B (realized volatility) 


1 




3.045 (0.00993) 


3.050 (0.0101) 


Time 


(sec) 


31 


16 


2 




2.128 (0.00943) 


2.141 (0.00991) 


Time 


(sec) 


23 


17 


3 




1.213 (0.00753) 


1.239 (0.00856) 


Time 


(sec) 


21 


20 


4 




3.540 (0.0242) 


4.162 (0.0365) 


Time 


(sec) 


59 


57 


5 




13.043 (0.0736) 


15.035 (0.120) 


Time 


(sec) 


72 


58 


6 




18.225 (0.122) 


20.934 (0.168) 


Time 


(sec) 


65 


70 


7 




1.568 (0.0115) 


1.809 (0.0180) 


Time 


(sec) 


25 


23 


8 




0.106 (0.00428) 


0.156 (0.00590) 


Time 


(sec) 


17 


20 


9 




2.434 (0.0131) 


2.669 (0.0198) 


Time 


(sec) 


30 


32 
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Table 5 

American put option pricing results (and compute times) 
using the exercise rule from methods C and D on an 
independent, common set of paths. The results are 
comparable to those given in Table 3 



Experiment no. 


C (MC/Grid) 


D (observable volatility) 


1 


3.050 (0.0107) 


3.050 (0.00977) 


Time (sec) 


193 


20 


2 


2.145 (0.00966) 


2.151 (0.00990) 


Time (sec) 


302 


23 


3 


1.271 (0.00913) 


1.276 (0.00916) 


Time (sec) 


253 


16 


4 


4.735 (0.0441) 


4.813 (0.0448) 


Time (sec) 


747 


62 


5 


16.092 (0.137) 


16.185 (0.139) 


Time (sec) 


740 


63 


6 


22.380 (0.184) 


22.646 (0.186) 


Time (sec) 


885 


80 


7 


1.965 (0.0213) 


2.003 (0.0220) 


Time (sec) 


274 


21 


8 


0.186 (0.00747) 


0.196 (0.00785) 


Time (sec) 


228 


18 


9 


2.868 (0.0236) 


2.894 (0.0238) 


Time (sec) 


378 


30 



data. Pan (2002) illustrates how to use both share and option price data to 
jointly estimate parameters under the real-world measure and risk-neutral 
measure. Eraker (2004) also implements joint estimation methodology for 
share and option price data. However, both Pan (2002) and Eraker (2004) 
only deal with the case of European-style options. 

We propose a two-step procedure to estimate our illustrative stochastic 
volatility model in an American-style derivative pricing framework. In the 
first step, we use share price data to estimate the statistical model parame- 
ters. Define the parameter vector 5 

9 = (p,a,/3,7). 

Conditional on estimated model parameter values, we then estimate the 
market price of volatility risk parameter A. Estimation of A requires data on 
both share and American option prices. Although it would be comprehensive 
to do a joint statistical analysis of both share and option prices, this problem 



Strictly speaking, estimation using only share prices (i.e., under the physical measure) 
involves the physical drift rate in the parameter vector. However, since we do not use the 
physical drift rate in risk-neutral pricing, we do not present its estimation results. 
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is quite formidable in the American option valuation setting. (The full joint 
estimation problem is left for future analysis.) We adopt a Bayesian approach 
to parameter estimation. The first objective is to estimate the posterior 
distribution of 9, 

(28) P(9\r u ...,r n )- ^ . . . ,r n \9) ■ P {9) 



fp(r r n \8)-p(6)d6- 

Since p E [— 1, 1], a > 0, (3 G R, and 7 > 0, we reparameterize a sub-component 
of the vector 9 so that each component will have its domain in R. This repa- 
rameterization facilitates exploration of the parameter space. Let us define 

/5 = tan (^")' a = log(a), and 7 = log (7). 

We assign independent standard normal priors to the components of the 
(reparameterized) vector 

9 = [p,a,(3,% 

The next step is the evaluation of the likelihood (or log-likelihood) for 9, 
however, an analytical expression for the likelihood is not available in closed- 
form. We employ Kitagawa's algorithm [see, e.g., Kitagawa (1987, 1996) 
and Kitagawa and Sato (2001) and the references therein] to estimate the 
log-likelihood of our model for share prices under the real-world measure. 
Kitagawa's algorithm, used to compute the log-likelihood for nonlinear, non- 
Gaussian state-space models, employs the fundamental principles of particle- 
filtering. The essence of Kitagawa's approach uses the weights described in 
equation (18) of Algorithm 1 to provide a Monte Carlo based approxima- 
tion to the log-likelihood. The details of this log-likelihood approximation are 
available in Kitagawa and Sato (2001). We provide Kitagawa's log-likelihood 
estimation approach for nonlinear, non-Gaussian state-space models in Al- 
gorithm 6. 

Remark. The approximation to the log- likelihood value associated with 
a particular parameter value in equation (29) of Algorithm 6 becomes more 
accurate as the number of particles m gets large. (We used m = 500 in our 
estimation exercise.) When resampling using the weights in equation (18), 
we sometimes work with the shifted log-weights as this leads to improved 
sampling efficiency. 

Once we obtain an estimate of the log-likelihood for the model parameters, 
we then combine it with the (log) priors and use a random-walk Metropolis 
algorithm to estimate the (log) posterior distribution of 9 (or, equivalently, 
9). That is, we estimate the posterior distribution for 9 and then transform 
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Algorithm 6 Kitagawa (1987) log- likelihood approximation 

Initialization 1. Input a proposed value of the parameter vector 9 for 
which the log-likelihood value is required. 

Initialization 2. Choose a number of "particles" m > 0. Draw a sample 
{y^\ ■ ■ ■ iVq"^} from the distribution of Y$ [see equation (8)]. 

Step 1. Cycle through the (i) forward simulation, (ii) weighting, and 
(hi) resampling steps of Algorithm 1 ensuring that the weights, w[ , 
% = 1, . . . , m, from equation (18) are stored for t = 1, . . . ,T. 

Step 2. Approximate the log-likelihood, 1(9), by 



back to the original scale to calculate the posterior distribution of 9 as 
stated in equation (28). Our Markov chain Monte Carlo (MCMC) algorithm 
utilizes a Gaussian proposal density in order to facilitate the estimation 
of the posterior distribution in the 9 parameter space. The details of our 
MCMC sampler are described in Algorithm 7, which is found in Section A. 2. 

Given the estimates of the model parameters under the real- world measure 
using the share prices, we now describe how to use both share and option 
price data to estimate the market price of volatility risk. We work with a 
summary measure of p(9\r±, . . . ,r n ), namely, the posterior mean, although 
one could use another measure such as the posterior median. The estimation 
of the market price of volatility risk will be computed conditional on this 
posterior summary measure. We implement this estimation exercise using 
the algorithms outlined in Section 4. 

5.2. Volatility risk estimation. The estimation of the market price of 
volatility risk, A, in equation (3) presents a computational challenge, partic- 
ularly in the American option valuation framework. We aim to use the al- 
gorithms described in this paper to propose methodology that will facilitate 
statistical inference of A in the American option pricing setting. To the best 
of our knowledge, this is the first analysis to compute and make posterior 
inference on the volatility risk premium for American-style (early-exercise) 
options. In many applications of option-pricing in a stochastic volatility 
framework, it is often the case that A is set to a prespecified value [see, e.g., 
Heston (1993), Hull and White (1987) and Pastorello, Renault, and Touzi 
(2000)]. One convenient approach is to set A = 0. This is known as the "min- 
imal martingale measure" in some strands of the option-pricing literature 
[Musiela and Rutkowski (1998)]. 



(29) 
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Both Pan (2002) and Eraker (2004) estimate stochastic volatility model 
parameters, including the market price of volatility risk, for the case of Euro- 
pean options. However, the early-exercise feature of American-style options 
adds further complexities to the estimation problem. One method to esti- 
mate the market price of volatility risk for American-style options would be 
to set up the following nonlinear regression model [similar in spirit to what 
Eraker (2004) does in his analysis of European options]. Let 

(30) Ui = Pt.(\) + ei, 

where Ui (i = 1, . . . ,L) is the observed American option price, Pq*(\) is the 
model predicted American option price conditional on the mean, 9* , of the 
posterior distribution in equation (28), and A is the market price of volatility 
risk. 6 Pq, (A) is computed using one of our proposed pricing algorithms in 
Section 4. The error term, £j, is assumed to be an independent sequence of 
iV(0,cr 2 ) random variables. 

The next step is to find the optimal value for A which we will denote by 
A*. One approach would be to minimize the sum-of-squared errors, S(X), 
where 

L 

(31) S(\) = Y,(Ui-P^Wf and 

1=1 

(32) A* = argminS^A). 

A 

As noted in Seber and Wild (2003), the minimum value of S(X) corresponds 
to the least-squares estimate of the nonlinear regression model in equation 
(30). One can also show that the least-squares estimate is equivalent to 
the maximum likelihood estimate (MLE). Optimizing 5(A), although com- 
putationally demanding, is feasible. We again adopt a Bayesian approach, 
outlined more generally in Seber and Wild (2003), to solve this optimiza- 
tion problem. First, we start with the (Gaussian) likelihood for the model 
in equation (30) which is given by 

p(«i, . . . ,u L \X,a 2 ) = (2vr C T 2 )- i / 2 ^exp|-^(^ - P^(A)) 2 | 

(33) 

= (2vra 2 )- i / 2 exp|-^5(A)|, 

where the second equality in the likelihood formulation follows from equa- 
tion (31). As suggested in Seber and Wild (2003), if we use the following 



6 Although it is not explicitly stated in equation (30), both 6* and Pg*(A) depend on 
the share price data St or, equivalently, the returns data Rt. 



SEQUENTIAL MONTE CARLO PRICING OF AMERICAN OPTIONS 



31 



(improper) prior distribution over (A, a 2 ), 

(34) p(X,a 2 )cx^, 

it follows that the posterior distribution for (A, a 2 ) is, up to a constant of 
proportionality, 7 

(35) P (\,a 2 \ Ul , . . . ,u L ) oc (a 2 )- (L/2+1) expj-^}. 

Recognizing the kernel of the inverse gamma distribution for a 2 in equation 

(35) , namely, IG(|-, — t^), we can integrate out a 2 to conclude that 

(36) P(A|«1, ■ ■ • OC ^ ( 5 ( A ))~ L/2 - 

Therefore, we have shown in equation (36) that, with the choice of prior in 
equation (34), the posterior distribution of the market price of volatility risk 
A is proportional to S(\)~ L / 2 . If we maximize this posterior distribution, it 
is equivalent to minimizing S(X), and hence, the result would be the same 
as the least-squares estimate or the MLE. 

One approach to approximating the posterior distribution in equation (36) 
is to use an MCMC based procedure. However, in the context of American 
option valuation, the early-exercise feature would present a major computa- 
tional challenge when evaluating S(X). A more feasible approach is to use a 
result that concerns the approximate normality of the posterior distribution 
close to the posterior mode [see Chapter 2 of Seber and Wild (2003) for a 
discussion in the context of nonlinear regression]. If we denote the posterior 
mode of equation (36) by A*, then under suitable regularity conditions [see 
Chapter 7 of Schervish (1995)], the posterior distribution of A near A* can 
be approximated as a normal distribution with mean A* and variance V* . 
In particular, 

(37) \\ Ul ,...,u L ~N(\*,V*), 
where 



£ 3g ) _}__ d 2 log(p(\\ Ul ,...,u L )) 



V* dX 2 



d 2 log[(5(A))- L / 2 ] 



dA 2 



A=A* 

Algorithm 8 in Section A. 3 of the Appendix illustrates how to estimate 
the parameters of the normal distribution in equation (37). We use a grid- 
search to find the posterior mode, A*, and then we estimate the deriva- 
tive expression in equation (38) using numerical approximation techniques 



7 For simplicity, we will suppress dependence on R t and 6* in the calculation for the 
posterior distribution of A. 
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(namely, central differences) described in, for instance, Wilmott, Howison, 
and Dewynne (1995). We next report the results of a data-analytic study 
of our American option valuation approach using the aforementioned algo- 
rithms on three equities. 

6. Empirical analysis. Our empirical analysis uses the algorithms out- 
lined in Section 4 together with historical share prices and American put 
option data. A reference to our computing code and data sets is given in 
Rambharat and Brockwell (2009). 

6.1. Data description. We obtain observed market data on equity prices 
as well as American-style put options on these underlying equities. We gather 
share price data on three equities: Dell Inc., The Walt Disney Company and 
Xerox Corporation. The share price data are sourced from the Wharton Re- 
search Data Services (WRDS). 8 The share price data represent two periods: 
(i) a historical period from Jan. 2nd, 2002 to Dec. 31st, 2003, and (ii) a 
valuation period from Jan. 2nd, 2004 to Jan. 30th, 2004 (the first 20 trading 
days of 2004) . The historical share price data will be used for model param- 
eter estimation and the trading share price data will be used in the option 
valuation analysis. The option price data sets are obtained for the period 
spanning the first 20 trading days of 2004. The data on American put options 
are extracted from the website of the American Stock Exchange (AMEX). 
We also use the LIBOR rates from Jan. 2004 (1-month and 3-month rates 
were around 0.011, and 6-month rates were around 0.012), obtained from 
Bloomberg®, for the value of the risk-free rate r. A plot of the share prices 
of the three equities over the historical period appears in Figure 2. Addi- 
tionally, Tables 6 and 7 summarize some features of the share and option 
price data; note that most options are at-the-money and their maturities 
range from short to long. 

Figure 3 depicts m along with Gaussian approximations using the output 
of Algorithm 1 for the three equities in our analysis. We choose two time 
points for each equity for our graphical illustrations, however, it should 
be noted that results are similar for other time points. We construct the 
summary vectors Qt based on these distributions. The grid-based Algorithms 
4 and 5 would, for instance, use a Gaussian distribution to approximate these 
filtering distributions, as this appears to provide an adequate fit. Note that 
for other choices of stochastic volatility models, a different (possibly non- 
Gaussian) distribution may be suitable as an approximation to the filtering 
distribution. 



8 Access to the WRDS database was granted through Professor Duane Seppi and the 
Tepper School of Business at Carnegie Mellon University. 
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Dell (2002-2003) Disney (2002-2003) Xerox (2002-2003) 




Fig. 2. A time series plot of the share prices of Dell, Disney and Xerox over the period 
Jan. 2nd, 2002 to Dec. 31st, 2003. 

6.2. Posterior summaries. We report the posterior means and 95% cred- 
ible intervals for the parameter vector 9 = (p, 7) for each of the three 
equities in Table 8. These are the results from execution of Algorithm 7. In- 
spection of a and 7 shows that the volatility process for all equities exhibit 



Table 6 

Description of the equity share prices: the share price range (in 
dollars) and the share price mean (and standard deviation) for 
the historical period Jan. 2nd, 2002 to Dec. 31st, 2003 
(estimation period) 



Equity 


Historical range 


Historical mean (sd) 


Dell 


22.33-36.98 


28.95 (3.57) 


Walt Disney 


13.77-25.00 


19.83 (2.87) 


Xerox 


4.30-13.80 


9.18 (1.82) 



Table 7 

Description of the American put options: the number of options in our data set (L), the 
maturity (in days), the strike price (in dollars), the share price range, and the share 
price mean (and standard deviation) for the Jan. 2004 valuation period 



Equity 


L 


Maturity 


Strike 


Share price range 


Share price mean (sd) 


Dell 


120 


15-98 


32.50-37.50 


33.44-35.97 


34.89 (0.65) 


Walt Disney 


60 


15-135 


25.00 


23.67-24.96 


24.45 (0.40) 


Xerox 


60 


15-135 


14.00 


13.39-15.15 


13.99 (0.48) 
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Dell (Dec. 30th, 2002) Dell (May 14th, 2003) 




-1.5 -1.0 -0.5 0.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 

Particles (bw=0.0258| Particles (bw=0.0266) 

Disney (Jul. 10th, 2002) Disney (Jun. 10th, 2003) 




-1.5 -1.0 -0.5 0.0 0.5 1.0 -2.0 -1.5 -1.0 -0.5 0.0 0.5 

Particles (bw=0.0400) Particles (bw=0.0480) 



FlG. 3. Examples of the sequential Monte Carlo filtering distributions, 7Y t , as defined in 
Section 3.2 for each of the three equities at a few selected dates in the estimation period 
2002-2003. The solid lines are kernel- smoothed sequential Monte Carlo estimates of n t 
and the dashed lines are the Gaussian approximations. 
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Table 8 

Bayesian posterior means and 95% credible intervals ( CI) for the parameters of the 
stochastic volatility model in equations (l)-(3) by equity 



Equity 


P 


a 


& 


7 


Dell 


-0.673 


1.830 


-1.087 


1.081 


95% CI 


(-0.767, -0.402) 


(0.267, 5.398) 


(-2.157, -0.316) 


(0.674, 1.695) 


Disney 


-0.612 


0.363 


-1.379 


0.686 


95% CI 


(-0.761, -0.259) 


(0.0194, 1.805) 


(-2.970, -0.288) 


(0.426, 1.080) 


Xerox 


0.198 


26.726 


-0.812 


3.494 


95% CI 


(-0.0215, 0.419) 


(8.036, 54.328) 


(-1.030, -0.603) 


(2.119, 5.214) 



noticeable signs of mean reversion and stochastic volatility, respectively. The 
results for j3, the overall level of the volatility process around which its mean 
reverts, are also reported. Observe, as well, that the results for both Dell 
and Disney show strong signs of the leverage effect between share prices and 
their volatility. This is evidenced by the negative values for p and the fact 
that the 95% credible intervals do not span zero. On the other hand, the 
results for Xerox are not as conclusive, as the 95% credible interval for p 
spans zero. 

Conditional on the posterior means reported in Table 8, we next estimate 
the posterior distribution of the market price of volatility risk parameter, 
A, for each equity. This is facilitated by the implementation of Algorithm 8. 
Essentially, for each equity, we find the posterior mode [i.e., the maximum 
value of the expression in equation (36)] and then we implement the analysis 
described in Section 5.2. Our results from the volatility risk estimation are 
reported in Table 9. We explain the full details of the numerical computa- 
tions in Section A. 3. 

Based on the posterior analysis of A, all three equities show evidence of 
a negative value for this parameter. Observe that the (approximate) 95% 
credible intervals are negative and do not span zero. This is consistent with 
results reported in the literature on the market price of volatility risk [see 
e.g., Bakshi and Kapadia (2003a, 2003b)]. In these studies, it is explained 

Table 9 

Parameters of the normal approximation to the posterior 
distribution of X, the market price of volatility risk, as well 
95% credible intervals 



Equity 


A* 


v* 


95% credible interval 


Dell 


-6.350 


0.00266 


(-6.451,-6.249) 


Disney 


-10.850 


0.00750 


(-11.020,-10.680) 


Xerox 


-0.700 


0.00815 


(-0.877,-0.523) 
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Table 10 

Comparison of the model fit to American put option data using 
A = A* and A = in terms of the mean-squared errors, S(X), 
between model-predicted option prices and market observed 
option prices 



Equity 


5(A = A*) 


S(\ = 0) 


Dell 


0.1839 


0.7580 


Disney 


0.3631 


0.9620 


Xerox 


0.02016 


0.02739 



that the negative volatility risk premium signals that investors are willing to 
pay a premium for "downside protection" (or adverse movements in share 
prices due to stochastic volatility). This results because a negative value 
for A implies a higher volatility mean reversion level [see equation (3)] and, 
therefore, most likely a higher volatility and option price. It is especially 
during these adverse movements (or volatile market activity) that our pricing 
methodology for American-style options is most pertinent. Furthermore, the 
magnitude of our results for the market price of volatility risk in this small 
empirical analysis are in agreement with studies that analyze a larger set of 
individual equities as well as index options [Bakshi and Kapadia (2003b)]. 
However, these earlier strands of empirical research do not analyze options 
with early-exercise features. 

Additionally, in Table 10, we report the sum-of-squared errors, <S(A), when 
A equals A* and when A = 0. Clearly, the model-predicted American option 
prices better match market data for the optimized (nonzero) A value. This 
casts some evidence in favor of a nonzero volatility risk premium. It is also 
interesting to note that the market price of volatility risk parameter A does 
not appear to be the same across all equities. Thus, if one had a portfolio of 
equities, it may be interesting to understand the differences in their volatil- 
ity risk premiums. One potential reason for this difference across equities 
is that the market price of volatility risk may be comprised of two compo- 
nents: (i) a market component, which we may expect to be constant across 
equities, and (ii) an idiosyncratic component, which may well be the funda- 
mental source of the differences in the estimates of A across equities (Nikunj 
Kapadia, personal communication). Further analyses call for more elaborate 
specifications of A and additional study of its underlying components. We 
could, for instance, model A as a time-varying function or even a stochastic 
process. 

7. Discussion. We introduce an algorithm for pricing American-style op- 
tions under stochastic volatility models, where volatility is assumed to be a 
latent process. Our illustrative model takes into account the co-dependence 
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between share prices and their volatility as well as the market price of volatil- 
ity risk. The approach is based on (i) the empirical observation that the 
conditional filtering distributions 7Tj can be well approximated by summary 
vectors Qt or parametric families of distributions that capture their key fea- 
tures, (ii) the use of a sequential Monte Carlo step to obtain and update 
the distributions 7Tj, and (iii) a gridding (quadrature) type approach and a 
Monte Carlo simulation-based adaptation to solve the associated dynamic 
programming problem. Our methodology is not tied to a specific stochas- 
tic volatility model or simulation procedure but could accommodate a wide 
range of stochastic volatility models and/or numerical simulation methods. 

We document, through numerical experiments, that our method uses fea- 
tures of nt to better price American options more accurately than simpler 
methods. In fact, our approach comes within standard error of the Ameri- 
can option price when volatility is assumed to be observed. One drawback 
with the methodology that we introduce is its computational demand. Addi- 
tionally, there are special situations (high mean reversion and low volatility 
of volatility) where simpler methods may suffice for pricing American-style 
options. However, our approach leads to a more optimal exercise rule for all 
model/option parameter settings. Our approach can also be practically im- 
plemented using sophisticated parallel computing resources. The proposed 
valuation method for pricing American-style options is especially useful for 
important financial decisions in a very volatile market period. 

Using observed market data on share prices for three equities (Dell, Disney 
and Xerox), we implement a Bayesian inferential procedure to estimate (i) 
share price model parameters, and (ii) the market price of volatility risk (or 
the volatility risk premium). Our results are consistent with findings in the 
literature, namely, leverage effects between share prices and their volatility 
and a negative volatility risk premium. Leverage effects are significant for all 
equities with the exception of Xerox. The volatility risk premium (measured 
by A) is also significantly negative since its credible interval does not span 
zero for any of the three equities. This ultimately implies that volatility risk 
is priced in the market and investors are willing to pay a premium for adverse 
movements in share prices due to volatility. Furthermore, we approximate 
the posterior distribution of A near its optimal value with a Gaussian dis- 
tribution. Consequently, we are able to make statistical inference about the 
volatility risk premium for early-exercise options. 

A potential refinement of our estimation procedure would be to implement 
a joint time series analysis of the share and option prices. This analysis can 
be facilitated by the algorithms in this paper, however, parallel computing 
power would be of tremendous assistance in this regard. Additionally, jumps 
in the statistical models could also be incorporated and our approach could 
be used to make inference on jump parameters and the jump risk premium. 
An additional line of future work would be to use the inference made on the 
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volatility risk premium of American-style options to construct profitable 
trading/hedging strategies that are pertinent to risk management settings. 

APPENDIX 

A.l. Proof of Lemma 3.1. We use an inductive argument. To begin with, 
ut(sq, ■ ■ ■ , st, dx) is by its definition (10) obviously a function of st and d?, 
and thus is trivially a functional of st,ttt and dx, which we can denote by 
u T {sT,^T,d T ). 

Next, suppose that for some t, we can write Ut+i(so, . . . , st+i, dt+i) = 
u t+ i(st+i,iTt+i,dt + i). Then from (12), 

(39) u t (s ,... : s t ,E)=g(s t ), 
and 

u t (s Q , ...,s t ,H) 

= £rnK* +1 (s , • • • ,st,S t+ i)\S = s Q ,...,S t = s t ) 

(40) = E RN (ul +1 (S t+ i,TT t+ i)\So = so,...,S t = s t ) 

(41) = / EMut +1 (S t+1 ,n t+1 )\S = s ,...,S t = s t , Y t = y t)M y t ) dt 

(42) = j E w (u* t+1 {St + i,-Kt+i)\St = s t ,Y t = y t )TTt{yt)dt. 

Equation (41) is obtained from (40) using a simple conditioning argument, 
and (42) then follows since {(St,Yt), t = 0, 1, ...} is a (bivariate) Markov 
process. The expression in (39) is obviously a function of st, and since st 
and nt completely determine the distribution of the arguments St+i and 
7T( + i to the function u^ +1 (-,-) in (42), it is also clear that the expression 
in (42) is a functional of st and 7rj. Thus, ut(so, . . . ,st, dt) is a functional of 
st, TTt and dt, which we denote by ut(st,irt,dt). 

Invoking this inductive step for t = T— 1, T — 2, . . . , gives the first part 
of the desired result. The second part of the result follows directly from the 
first part, along with the definitions (11) and (13). 



A.2. Estimation algorithms. The MCMC algorithm that we use to esti- 
mate the posterior distribution p(6\r\, . . . ,r n ) in equation (28) is described 
below in Algorithm 7. We implement a random-walk Metropolis-Hastings 
(MH) algorithm to arrive at our estimate of the posterior distribution of 9. 
Additionally, Algorithm 8 describes the procedure used to optimize the (ap- 
proximate) posterior distribution of the volatility risk premium p{ A \u\, . . . , ui) 
in equation (36). 
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Algorithm 7 Markov chain Monte Carlo (MCMC) posterior simulation 
Initialization 1. Input the parameters of the prior and proposal distribu- 
tions corresponding to the 9 parameterization. 

Initialization 2. Set the starting value of 9 at the prior mean (or any other 
reasonable value) and denote this by 9 C to represent the current value. Use 
Algorithm 6 and the log-prior densities to compute a log-posterior value 
of 9 C and denote this by LPost c . 

for i = 1, . . . , B do 

• Sampling. Draw the ith potential value of 9 using a multi-variate nor- 
mal proposal density and denote this by 9 S . 

• Posterior evaluation. Use Algorithm 6 along with the log-prior den- 
sities to compute the ith log-posterior value of 9 S and denote this by 
LPost s . 

• MH-step A. Sample U { ~ Unif [0, 1]. 

• MH-step B. If log(t/j) < (LPost s - LPost c ), update 9 C = 9 S and 
LPost c = LPost s . Else do not update 9 C and LPost c . 

end for 

Output. Perform the relevant inverse transformation of 9 in order to return 
the posterior distribution of 9. 



Remark. We implement in Algorithm 7 a stochastic search over the 
transformed parameter space 9 as defined in Section 5.1. We use a multi- 
variate Gaussian proposal density with mean equal to the current point and 
a diagonal variance-covariance (VCOV) matrix. The elements of the VCOV 
matrix that proposed values for p, a, f3 and 7 are, respectively, 

0.001, 0.005, 0.0025 and 0.001. 

(We experimented with different parameterizations of the proposal density 
and did not find appreciable differences in the results.) 

Remark. Regarding the statistical estimation of the model parameters 
via MCMC, we initialize the physical drift rate using the average of the re- 
turns data and we set the correlation parameter p = for each case. We ini- 
tialize the parameters of the stochastic volatility process (a,/3,7) as follows: 
Dell (8.20,-1.00,1.50), Disney (4.40,-1.20,1.10) and Xerox (17.0,-0.800,3.00). 
These are approximate maximum likelihood estimates that are computed 
using Cronos, an open source software written by Anthony Brockwell and 
available at http://www.codeplex.com/cronos. 
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Remark. We set B to 50,000 and take a burn-in period of 5000 in 
Algorithm 7. Convergence is ascertained using trace plots of the posterior 
output. 

Remark. We use the Monte Carlo based approach described in Sec- 
tion 4 to evaluate Pg*(Xj), as we found this to be faster for the purposes of 
our empirical analysis in Algorithm 8. 

A.3. Calculation of posterior distribution of A. We now outline some 
of the calculations that are needed to compute the normal approximation 
[equation (37)] to the posterior distribution of A near the mode of its true 
posterior distribution [equation (36)]. Recall that the mean of the normal 
approximation is the posterior mode. The reciprocal of the variance term is 



1 

V* ~ 
Observe that 



d? log(p(A|ui,...,u L )) 



dX 2 



d 2 log[(5(A))- L /2] 



A=A* 



dX 2 



A=A* 



d 2 log[(S(A))- L / 2 ] 
dX 2 



S"(X)-S(X)-(S'(X)f 
(S(X)) 2 



We approximate the first and second derivative expressions S"(A) and S"(X) 

9 



MS 



S'(X) 



S"(X) 



S(X + Aq) — S(X — A 



G 



and 



2A G 

S(X + A G ) - 25(A) + S(X- A G ) 

A 2 
G 



In order to evaluate the derivative expression at A* , we use the values in Ta- 
ble 11. Once these computations are completed, the normal approximation 
to the posterior distribution of A is completely specified. 

A. 4. Numerical simulation results. Tables 12 and 13 provide the results 
of the numerical experiments, described in Section 4.3, when using an Euler 
discretization from the stochastic volatility model in equations (1), (2) and 
(3). As can be observed from the results in these tables, the option prices 
using the Euler discretization are almost identical to those using the exact 
simulation for all methods described in Section 4.3. Hence, when working 
with models that may not permit exact simulation, first and higher order 
discretization techniques also facilitate option pricing methods such as the 
ones done in this analysis. 



9 Theoretically, the first derivative is equal to at the optimized point [i.e., S"(A*) = 0]. 
The numerical approximation of the first derivative using the expressions above comes 
within tolerance of 0. 
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Table 11 

Values of 5(A) at different points. Recall the values for A* are 
given in Table 9. Additionally, the spacing between the X-grid, 
Ac, equals 0.05 



Equity 


S(A* - Ac) 


S(X*) 


S(\* + A G ) 


Dell 


0.1847 


0.1839 


0.1860 


Disney 


0.3658 


0.3631 


0.3644 


Xerox 


0.02023 


0.02016 


0.02030 
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Table 12 

Euler discretization — American put option pricing results (and compute times) using 
methods A (basic LSM with past share prices) and B (realized volatility as an estimate 
for volatility). Results are very similar to those reported for the exact simulation in 

Table 2 



Experiment no. 


A (basic LSM) 


B (realized volatility) 


1 


3.047 (0.0107) 


3.040 (0.0103) 


Time (sec) 


8 


8 


2 


2.147 (0.00946) 


2.154 (0.00974) 


Time (sec) 


15 


15 


3 


1.213 (0.00759) 


1.230 (0.00841) 


Time (sec) 


13 


11 


4 


3.569 (0.0242) 


4.152 (0.0358) 


Time (sec) 


37 


39 


5 


12.995 (0.0743) 


15.066 (0.120) 


Time (sec) 


36 


42 


6 


18.623 (0.121) 


21.474 (0.168) 


Time (sec) 


44 


44 


7 


1.588 (0.0120) 


1.843 (0.0186) 


Time (sec) 


13 


14 


8 


0.0945 (0.00367) 


0.145 (0.00553) 


Time (sec) 


16 


14 


9 


2.437 (0.0132) 


2.668 (0.0197) 


Time (sec) 


20 


21 
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Algorithm 8 Posterior analysis of the market price of volatility risk 

Initialization 1. Compute the posterior summary from the model estima- 
tion routine outlined in Algorithm 7. (This could be, for example, the 
posterior mean or median of 0.) Denote the posterior summary measure 

Initialization 2. Input the L American option contract features including 
initial share price and initial volatility based on, say, a 10-day historical 
volatility measure. 

Initialization 3. Input a grid of A values that will be used to find the 
optimal value for A. (This can be roughly estimated via trial and error.) 
Denote the number of grid points by G and the A values by Ai, . . . , Xq 
and the distance between each grid point by Ac- 
tor i = 1 , . . . , G do 

• Option valuations. For j = 1,...,L, compute and store the model- 
predicted American option values, P^* (Aj) using the pricing algorithms 
(either Monte Carlo or grid-based described in Section 4). 

• Optimize SSE. Compute the value of the sum of squared errors S(X) 
defined in equation (31). 

end for 

Find optimal A. Find the optimal A value, A*, among the grid points 
(Ai, . . . , Ag) such that 

A* = argminS(A). 

A 

Posterior computation. Starting with the prior specification in equation 
(34), S(\)~ L / 2 is the posterior distribution of A up to a constant of pro- 
portionality. Calculate the approximate posterior distribution by using 
the Gaussian approximation to the posterior distribution near the mode 
[i.e., near A*; see Seber and Wild (2003) or Schervish (1995)]. 

Output. Return the approximate posterior distribution of A from equation 
(37) and summarize accordingly. (Derivative evaluations are evaluated 
numerically using central difference methods.) 



the Remarks Computing Group in the Department of Statistics at Carnegie 
Mellon University. 
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Table 13 

Euler discretization — American put option pricing results (and compute times) using 
methods C (MC/Grid) and D (observable volatility). Results are very similar to those 
reported for the exact simulation in Table 3 



Experiment no. 


C (MC/Grid) 


D (observable volatility) 


1 


3.052 (0.0109) 


3.045 (0.00982) 


Time (sec) 


161 


8 


2 


2.164 (0.00958) 


2.173 (0.00986) 


Time (sec) 


321 


15 


3 


1.257 (0.00900) 


1.261 (0.00902) 


Time (sec) 


212 


10 


4 


4.684 (0.0439) 


4.734 (0.0441) 


Time (sec) 


761 


37 


5 


16.270 (0.138) 


16.330 (0.138) 


Time (sec) 


748 


37 


6 


22.764 (0.183) 


22.999 (0.185) 


Time (sec) 


831 


47 


7 


1.998 (0.0217) 


2.045 (0.0225) 


Time (sec) 


302 


16 


8 


0.169 (0.00691) 


0.172 (0.00704) 


Time (sec) 


236 


15 


9 


2.851 (0.0233) 


2.861 (0.0235) 


Time (sec) 


379 


24 



SUPPLEMENTARY MATERIAL 

Code and data sets (DOI: 10.1214/09-AOAS286SUPP; .zip). Sequential 
Monte Carlo pricing routines. The R code used in our analysis for pricing 
American-style options in a latent stochastic volatility framework as well 
code for optimizing all model parameters, including the market price of 
volatility risk, are part of this supplement. American put option data sets. 
The data sets used in our pricing/estimation analysis include historical share 
prices and American put option prices for three equities: Dell, Disney and 
Xerox. The data files are in this supplement. 
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